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Abstract 

An efficient 0{N) cluster Monte Carlo method for Ising models with long-range 
interactions is presented. Our novel algorithm does not introduce any cutoff for in- 
teraction range and thus it strictly fulfills the detailed balance. The realized stochas- 
tic dynamics is equivalent to that of the conventional Swendsen-Wang algorithm, 
which requires 0{N'^) operations per Monte Carlo sweep if applied to long-range 
interacting models. In addition, it is shown that the total energy and the specific 
heat can also be measured in 0{N) time. We demonstrate the efficiency of our algo- 
rithm over the conventional method and the 0(A^log A^) algorithm by Luijten and 
Blote. We also apply our algorithm to the classical and quantum Ising chains with 
inverse-square ferromagnetic interactions, and confirm in a high accuracy that a 
Kosterlitz-Thouless phase transition, associated with a universal jump in the mag- 
netization, occurs in both cases. 
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1 Introduction 



Systems with long-range interactions exhibit more involved phase diagrams 
and richer critical phenomena than those with only nearest-neighbor interac- 
tions. One of the most prominent examples is the Ising model with long-range 
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interactions, whose Hamiltonian is defined as 

where erf = ±1, Jjj is the couphng constant between the ith. and jth sites 
(z, j = 1, 2, ■ ■ ■ , A^), and is the total number of spins. Among the models 
described by Eq. ([1]), the one-dimensional chain with algebraically decaying 
interactions has been studied most intensely so far. The interaction Jij for the 
model is written as ^ 

where a is the parameter characterizing the range of interaction. (Note that a 
should be positive to assure the energy convergence.) In spite of the extremely 
simple form of its Hamiltonian, the model is known to exhibit various critical 
behavior with respect to the parameter a: When a is sufficiently large, the sys- 
tem belongs to the same universality class as the nearest- neighbor model, i.e., 
no finite-temperature phase transitions [T][2] . At a = 1, however, the system 
exhibits a Kosterlitz-Thouless phase transition at a finite temperature [3|l^|f5] . 
In the regime 1/2 < a < 1 , the critical exponents of the system changes con- 
tinuously as a is decreased [B]. Finally, when a is equal to or smaller than 
1/2, the system shows the critical exponents of the mean-field universality [6j. 
Such rich and nontrivial phenomena associated with the long-range interac- 
tions have attracted much interest and many researches have been done both 
theoretically and numerically. 

On numerical researches of long-range interacting spin models, however, the 
standard Monte Carlo techniques encounter a serious problem, i.e., quadratic 
increase of CPU time per Monte Carlo sweep as the system size increases. This 
is simply because there are ArC2 ~ A^^ different pairs of spins to be consid- 
ered in an A^-spin system. For unfrustrated spin models, the cluster methods, 
such as the Swendsen-Wang [7j or Wolff [S] algorithms, are the methods of 
choice, since they almost completely eliminate correlations between succeeding 
spin configurations on the Markov chain. Unfortunately, the cluster algorithms 
share the same difficulty with the single spin fiip update. In 1995, however, 
Luijten and Blote introduced a very efficient cluster algorithm [H]. What they 
focused was that, on average, only 0{N) among 0(A^^) bonds contribute to 
cluster construction. By employing a rejection-free method based on binary 
search on a cumulative probabilities, they succeeded in reducing the num- 
ber of operations per Monte Carlo sweep drastically to O(A^logA^). Recently, 
the same strategy has been applied to the quantum Monte Carlo method for 
long-range ferromagnetic Ising models in a transverse external field [TU] . 

In the present paper, we propose a still faster cluster Monte Carlo algo- 
rithm for long-range interacting ferromagnets. Our method is based on the 
extended Fortuin-Kasteleyn representation of partition function [TT][T^ and 



2 



an extremely effective technique for integral random number generation, so- 
called Walker's method of alias p^lIT^ . As the method of Luijten and Blote, 
the proposed algorithm does not introduce any cutoff for interaction range and 
realizes the identical stochastic dynamics with the original 0{N'^) Swendsen- 
Wang method. The CPU time per Monte Carlo sweep is, on the other hand, 
merely proportional to N instead of log or A^^ , and is an order of magni- 
tude shorter than that of Luijten and Blote for sufficiently large systems. In 
addition to its speed, our algorithm has several advantages: First, it is quite 
robust, that is, it works efficiently both for short-range and long-range inter- 
acting models as it stands. Second, the calculation of the total energy and the 
specific heat are also possible in 0{N) time without any extra cost. Third, 
our Monte Carlo algorithm is straightforwardly extended for quantum models, 
such as the transverse-field Ising model, the Heisenberg model, etc. 

The organization of the present paper is as follows: In Sec. 2, we briefly review 
the Swendsen-Wang cluster algorithm and its O(A^logA^) variant by Luijten 
and Blote. In Sec. 3, we present our new algorithm in detail. We also show 
how the total energy and the specific heat are calculated in 0{N) time, and 
the extension of the 0{N) algorithm to the transverse- field Ising model. In 
Sec. 4, a benchmark test of our new algorithm is presented. As an application 
of the 0{N) algorithm, the Kosterlitz-Thouless transition of the Ising chain 
with inverse-square interaction is investigated in Sec. 5. Especially, we confirm 
the universality between the classical and quantum Ising models in a high 
accuracy. Section 6 includes a summary and discussion, followed by appendices 
on some technical details about Walker's method of alias. 



2 Conventional Cluster Algorithms for Ising model with Long- 
range Interactions 

2. 1 Swendsen- Wang Method 

In this section, first we briefly review the cluster Monte Carlo method by 
Swendsen and Wang [7]. Each Monte Carlo sweep of the Swendsen-Wang 
algorithm consists of two procedures, graph assignment and cluster flip. In 
the former procedure, one inspects all the bonds sequentially, and each bond 
is activated or deactivated with probability 

Pi, =5,.,,.[l-exp(-2/3Ji,)] (3) 

and (1 — Pij), respectively. Then, after the trials for all the bonds, each cluster 
of spins connected by active bonds is flipped at once with probability 1/2, and 
a terminate configuration is generated. 
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The stochastic process achieved by the Swendsen-Wang algorithm is ergodic. 
It is also proved, with the help of the Fortuin-Kasteleyn representation of the 
partition function [TTP^ . that the algorithm satisfies the detailed balance. 
The partition function of the Hamiltonian ([T]) is written as 

Z = J2Il e^-^'^'^'^'^I = E n e^'''"'' (4) 

c i<j c e=i 

where A^b (= NC2) denotes the total number of bonds, £ is the bond index, 
and Je = Jij and cr^ = a^aj are the coupling constant and the product of the 
spin states of the both ends of the bond i, respectively. We first extend the 
original phase space of Ising spins {c} (= {{0-1,0-2, ■■ ■ ,cr%)}) to the direct 
product of phase spaces of spins {c} and graphs {g}. A graph g is defined by 
a set of variables ge {i = 1,2, ■ ■ ■ , A^b), each of which is defined on each bond 
(or link). The graph variable ge describes whether the ith bond is activated 
{gi = 1) or not {gi = 0). By using the extended phase space, the partition 
function (jl]) is expressed as 

^ = cEE^(c,^?) (5) 

c g 

with 

^ic,g) = Y[ A{oi,gi) Vi{gi), (6) 
e=i 

where C is a constant and the summation J2g runs over 2^'^ possible graph 
configurations. The weight functions A and Vi are defined as 

.0 ii Of = 1 and op = —1 
A{cTe,9e) = { (7) 
1 otherwise 

Vf{g,) = (e^^^^ - ir, (8) 

respectively. The equality between Eqs. (jl]) and (jSD is verified by figuring out 
the summation with respect to g in the latter: 

Nb Nb 

J2^{c,9) = J2E^i^i^9e)Vi{gi) = Y[{A{oe,0)Ve{0) + A{oe,l)Ve{l)) 
g g e=i 1=1 

Nb 

= n(i+^^.i(e'''^-i)) (9) 
1=1 

Nb Nb 

£=1 t=i 

and thus C = exp(— /5 J2e -h)- From Eq. ([5]), we can consider uj{c, g) as a weight 
of the configuration (c, g) in the extended phase space. 
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The Swendsen-Wang method is a procedure to update the spin configuration 
dynamically by going through an intermediate graph configuration. Consider 
that the initial spin configuration is c. We assign a graph g for the spin con- 
figuration c with the following probability: 



}_^u{c,g) ^^i}^Vi{gi)A{ae,gi) 

That is, for each bond we assign g^ = 1 with probability 

PiM = H",) = 1) ^ = ^,,,1(1 - e ). (11) 

This probability turns out to be the same as the one in Eq. ([3]). The cluster 
construction in the Swendsen-Wang algorithm is thus equivalent to assign- 
ing graph variables in the Fortuin-Kasteleyn language. The second procedure 
(cluster fiip) in the Swendsen-Wang algorithm is also represented clearly in 
the Fortuin-Kasteleyn representation: Under a given graph configuration g, a 
new spin configuration c' is selected according to probability 



P{c'\9) = = . (12) 

j:ii^i^i^9i) 

c 1=1 



Note that JJ^^i ^{o'e, 9e) takes either 1 or 0, depending on whether all the 
active bonds have ct£ = 1, or not. The last equation means that among all 
the allowed spin configurations, which have nonzero uj{c,g), for a given g, a 
configuration is chosen with equal probability. This is equivalent to fiipping 
each cluster of spins connected by active bonds independently with probability 
1/2. 

The detailed balance condition of the Swendsen-Wang method is thus repre- 
sented in a concrete form: 



P(c'|c)c.(c) = j:Pic'\9)Pi9\c)coic) = ^i^^MM^. (13) 

9 9 

Since the most right-hand side of Eq. (fT3|l is symmetric under the exchange 
of initial and terminal spin states c and c', the detailed balance is satisfied 
automatically. 
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2.2 0{N\ogN) Method by Luijten and Blote 



It has turned out that the Swendsen-Wang cluster algorithm works quite well 
for wide variety of systems without frustration. Especially, it removes almost 
completely the so-called critical slowing down near the continuous phase tran- 
sition point. Since there is no constraint about the range of interactions in its 
construction, the Swendsen-Wang algorithm is also applicable to long-range 
interacting systems without any modification. However, since the number of 
bonds iVb is = 0{N^) in such systems, the number of operations required 
for one Monte Carlo sweep is proportional to A^^, which is significantly more 
expensive than those for the nearest-neighbor models. 

A nifty solution for reducing drastically the number of operations from 0{N'^) 
to O(iVlogA^) was devised by Luijten and Blote [9]. What they noticed is 
separating the activation probability Pf, into the two parts: 

Pi=PiK,i (14) 
= l-exp(-2/3J,). (15) 

If one chooses candidate bonds with probability pi and then activate them 
with probability ^^f,! afterward, the probability is realized eventually. For 
choosing the candidates bonds, one could use a more efficient method than 
the exhaustive search, since is independent of the spin state cr^ and prede- 
termined statically at the beginning of the Monte Carlo simulation. Indeed, it 
is seen that the number of candidate bonds are typically much smaller than 
A^b- The average number of candidate bonds is evaluated as 

= ^ E E(l - e-'''-) - ^ E / dr r'-\l - e-^^«) 
PN J dr r'^-^J{r). 

Here we assume the translational invariance and that Jij depends only on the 
distance, i.e., Jij = J{rij). If J(r) decays faster than r~°', which is equivalent to 
the condition of energy convergence for the ferromagnetic models, the integral 
in the last expression converges to a finite value in the thermodynamic limit. 
Thus, at a fixed temperature, the number of candidate bonds increases as A^ 
instead of A^^. 

For choosing candidate bonds, Luijten and Blote adopted a kind of rejection- 
free method, which is based on the binary search of cumulative probability 
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tables. Let us define 

q^^^= PmUil-pt) {q?^=Pi) (17) 
e=i 

m 

^^i°^=E^r = and <Vi = 1) (18) 

e=i 

where q^^^ is the probabihty that the mth bond is eventually chosen as a 
candidate after the failure for the first, second, • • • , and (m — l)th bonds, 
and C^^ is the cumulative probability of ql^\ When an uniform real random 
variable U (g [0, 1)) is generated, U satisfies C^li <U< with probability 
q'^ . The first candidate bond m can then be directly chosen by searching the 
first element larger than U . After the mth bond is activated or deactivated 
depending on its spin state cr^, one can continue the same procedure using 
the tables 

n-l 



= pn n (1 - P^) (ill = (19) 

n 

Ci-^= E (CM = and cS^i, = 1). (20) 

e=m+i 

In practice, one does not have to prepare for all m's, since is readily 
expressed in terms of C^^ and as 

ci-^ - %{c^^^ - cj^^ (21) 

qm 

In other words, comparing to a random number U {U E [0, 1)) is equiv- 
alent to comparing C^°) to C^) + (g^VPm)t^- 

Besides an initial table setup, which requires O(A^b) operations, searching an 
element in the table can be performed very quickly by using the binary search 
algorithm. The number of operations required for each search is O(logA^b) = 
O(logA^), which is significantly smaller than O(A^b) for the naive sequential 
search. Since the average number of candidate bonds is 0{N), a whole Monte 
Carlo sweep is accomplished by 0{N log N) operations on average. 



3 New O(A^) Cluster Algorithm 

3.1 Formulation of 0{N) Method 



The factor logA^ in the method of Luijten and Blote is due to the fact that 
they use the binary search algorithm in looking for a next candidate. This 
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factor might be removed if one can use some 0(1) method instead of the bi- 
nary search. Walker's method of ahas [T3p^ has been known as such an 0(1) 
method to generate integral random numbers according to arbitrary probabil- 
ity distribution for a long time (Appendix A), and is a potential candidate for 
the replacement. Unfortunately, for the Walker method one can not use the 
smart trick presented in Eq. fl2Tl) for reducing the number of tables. It means 
that one has to prepare a table of length O(iVb) for each m before starting 
the simulation. The total amount of memory storage for storing all the tables 
is thus 0{N^) = 0(A^^), which is not acceptable in practice. In the follow- 
ing, we present a different approach based on the extended Fortuin-Kasteleyn 
representation, which solves the storage problem and enables us to use the 
efficient 0(1) method by Walker with reasonable storage requirement, O(A^b) 
(or 0{N) for systems with translational invariance). 

Our central idea is assigning a nonnegative integer to each bond instead of a 
binary (active or inactive). The integer to be assigned is generated according 
to the Poisson distribution. The probability that a Poisson variable takes an 
integer k is given by 

/(^;A) = ^p, (22) 
where A is the mean of the distribution. Note that /(O; A) = e"'*' and therefore 

oo 

Y.f{k-X) = l-e-\ (23) 
k=i 

which is equal to pe in Eq. ( fT5i) . if one puts A to be 2/3 J^. That is, if one 
generates an integer according to the Poisson distribution with A = 2/3 J^, it 
will take a nonzero value with probability pe. Thus conventional procedure in 
activating bonds in the Swendsen-Wang algorithm can be modified as follows: 
Generate a Poisson variable for each bond with a mean 2/3J^, then activate 
the bond only when the variable is nonzero and the spins are parallel. At first 
glance, it seems that the situation is getting worse, since a Poisson random 
number, instead of a binary, is needed for each bond. At this point, however, we 
leverage an important property of the Poisson distribution: the Poisson process 
is that for random events and there is no statistical correlation between each 
two events. It allows us to realize the whole distribution by calculating just 
one Poisson random variable with the the mean Atot = J2e -^e- The following 
identity clearly represents the essence: 

n fih; A,) = / (fc,ot; Atot) , ,,^7" , ■ 11 ^ (24) 

where fctot = J^e^i. This identity is verified in a straightforward way by sub- 
stituting Eq. (!22|) in both hands. The left-hand side of Eq. is the prob- 
ability that ki is assigned to each bond. The right-hand side, on the other 
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hand, stands for the probability of generating a single Poisson number /ctot 
and then distributing events to each bond with the weight proportional 
to A^. Distributing each event can be carried out in a constant time using 
Walker's method of ahas. Since generating a Poisson number with the mean 
Atot takes only O(Atot) time on average, the number of operations of the whole 
procedure is also proportional to Atot = '^PJZiJe, which is 0{N) for energy 
converging models. 

Before closing this section, let us describe our 0{N) algorithm in terms of an 
extended Fortuin-Kasteleyn representation. Introducing a configuration k — 
{ki, k2, - ■ ■ , ki\f^) instead oi g — {gi,g2, • ■ ■ , gN^,) in the original representation, 
the partition function is expressed as 



JVb iVb oo 

^ = E E n ^(^^ hmh) = E n E ^('^^ hmh) (25) 



c k e=i 



c i=l ki=0 



with 



A((7, k) = < 



Vi{k) 



if A; > 1 and a = -1 

1 otherwise 

A;! 



(26) 



(27) 



The original partition function is easily recovered by performing the summa- 
tion over kes first. 



3.2 Procedure in Concrete 



One Monte Carlo sweep of the 0{N) algorithm is described as follows. 

(1) Generate a nonnegative integer k according to the Poisson distribution 
with the mean Atot- 

(2) Repeat the following procedure k times: 
(2-a) Choose a bond £ with probability 

(28) 



by using Walker's method of alias. 
(2-b) If ai = 1 then activate bond i. If the bond is already activated, just 
do nothing. 
(3) Fhp each cluster with probabihty 1/2. 
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For a system with translational invariance, step (2-a) in the above procedure 
can be replaced by 



(2-a') Choose a site i with probabihty 1/N, then choose another site j with 
probabihty 

^Aj-- (29) 

In this way, the size of tables for the modified probability and alias number 
in the Walker method (see Appendices A and B for details) can be reduced 
from 0{N^) down to 0{N). 



3.3 Total Energy and Specific Heat Measurement 



Measuring the total energy is also costly for long-range interacting models. 
In Ref. |15], Krech and Luijten proposed a method based on the fast Fourier 
transform. In this section, however, we show that the total energy and the 
specific heat are also calculated in 0{N) time in the present algorithm. Indeed, 
the both quantities are obtained free of charge during Monte Carlo sweeps. 



Let us consider the expression for the energy in the extended Fortuin-Kasteleyn 
representation. Differentiating the partition function (l25l) with respect to the 
inverse temperature, we obtain 



E 



dp 



tot 



c k e 



(30) 



P \ f / MC 



where Jtot = J2eJe, and (■ ■ •)mc denotes the Monte Carlo average of an ob- 
servable in the present 0{N) algorithm. Thus, in order to calculate the to- 
tal energy, nothing more than the information one uses during Monte Carlo 
sweeps is needed. It also applies to the the specific heat. Differentiating the 
right-hand side of Eq. (130|) once again, one obtains the following expression 



/32 dE 
'N~dp 



N 



1 



1 

iV 



MC 



MC 



Eh 

e 



1 ^ 2 



MC 



E^^ 



MC 



MC 



'tot 



(3 



(31) 
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for the specific lieat, which is not simply a variance of of energy (130|) but has an 
extra term {J2eke)MC- We note that the expressions for the total energy (!30!) 
and the specific heat fl3ip have a close relation with those for the quantum 
Monte Carlo method in the continuous imaginary-time path integral or the 
high-temperature series representations |16j . 



3.4 Quantum Cluster Algorithm for transverse-field Ising Model 



The 0{N) Monte Carlo algorithm can be extended quite naturally to quantum 
spin systems with long-range interactions. In this section, as a simplest exam- 
ple, we present a quantum cluster algorithm for the long-range Ising model in 
a transverse external field. Application to other quantum spin models, such 
as the Heisenberg or the XY models, is also straightforward. 

The Hamiltonian of the transverse-field Ising model with long-range interac- 
tions is defined as 

N 

^ = -E^^.^r^|-Er^^ (32) 

i<j i=l 

where F denotes the strength of transverse external field, and af and cxf are 
the Pauli operators at site i. According to the standard prescription pTj, we 
start with dividing the Hamiltonian fl52]) into two parts, bond terms Tib = 
~Y.i<jJij'^i<^j and site terms Hs = — Z^iliTcrf. The partition function is 
then expanded as 



Z = Tre"^^ = lim TUil (e 



e « 



lim 



M 

E n 

... m=l 



s Ml 
) 

m+l), 



(33) 



where M is the number of Suzuki- Trotter slices along the imaginary-time axis. 
In Eq. f l55]) . we inserted the identities \4>m){(pm\ between the operators. 
The basis set {4>m} is chosen so that {erf} are diagonahzed (and so is TCh), 
and Em = {(pm\'Hh\(pm) ■ We impose the periodic boundary conditions in the 
imaginary-time direction: 0m+i = 4>i- Expanding the exponential operators 
of the site Hamiltonian to the first order, we obtain the following discrete 
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imaginary-time path integral: 



M 
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Z = lim TT e M 

(Air-- ,<I>M ™=J- 
M 



N 

n 

i=l 



(m) (m) 

<f>l,---,^M "1=1 '-Kj 



n r N 



ilni3r^(™)^('"+i) 



1=1 



r M n J 

Chrn^ ^ exp 



m=l i<j 



M 



M N 



I \ ^ \ ^ 1 1 (m+l) 

+ > > - In— cr- V- 

m=l 1=1 ^ ^'^ 



(34) 



where cxf = (af ± «crf)/2 are the spin ladder operators, a,-™''' = (^mlal'l^m), 
and C is a constant. Thus, the partition function of the transverse- field Ising 
chain of sites is represented by that of a two-dimensional classical Ising 
model of M X sites, where the interactions are long-ranged along one axis 
(real space direction) and short-ranged along the other axis (imaginary-time 
direction). The coupling constants in both directions are f^dJij^^^^^ = f3Jij/M 
and /^ciJ^*'""") = iln(/3r/M), respectively, where /3d is a fictitious inverse tem- 
perature of the mapped system. The 0{N) cluster algorithm presented in the 
previous subsection is then applied to this classical Ising model straightfor- 
wardly. 

Furthermore, it has been shown that one can take the Trotter limit (M — > oo) 
in Eq. flMl) . and perform Monte Carlo simulations directly in the imaginary- 
time continuum [T81IT9] . It is possible because the coupling constant along 
the imaginary-time axis increases as M does. The average number of 

antiparallel pairs (or kinks) remains finite even in the continuous-time limit, 
and therefore one does not have to take configurations with infinite number 
of kinks into account. Specifying the number of kinks by n and its space- 
time position by (jp, Sp) (p = 1, 2, ■ ■ ■ , n), we obtain the continuous-time path 
integral representation of the partition function: 



E 



-l3Eo 



+ EE 



1,(3 rf} ri+1 



, (35) 



where tq = 0, r„+i = /?, and Ep is the diagonal energy 
configuration between the pth and {p + l)th kinks {En = 
example of path integral configuration is shown. 



[(f)\Hh\(f)) of the spin 
Eo). In Fig.[T](a) an 



The cluster algorithm is also defined directly in the Trotter limit. Since the ac- 
tivation probability of temporal bond with parallel spins, 1— exp(2/5ci J*^**™"*^^) = 
1 — /3r/M, becomes almost unity for M ^ 1, the probability of finding n in- 
active links (open squares in Fig. [1]) in a uniform temporal segment of unit 
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(a) 



(b) 



■X- 



C 

'5b 



■X-i 



i - 



-X- 



r? 



■X- 







J' 



Fig. 1. (a) Example of the space-time configuration in the continuous-time path 
integral representation. The arrows at the bottom denote (pQ. Solid and broken lines 
denote the continuously aligned up and down spins, respectively. The open circles 
represent the space-time position where an ladder operator is inserted, (b) Possible 
graph configuration assigned to spin configuration (a). The open squares represent 
the positions where the temporal bond is deactivated, while the filled circle represent 
those where a spatial bond is activated. The spatial long-range bonds are activated 
only when the spins are parallel at both ends as depicted in (a), where the candidates 
connecting antiparallel spins are rejected (x-marks). 

imaginary time, which contains M/ (3 Trotter shces, is given by a Poisson dis- 
tribution, 

M/pCn{i - /3r/M)(^^/^-")(/3r/M)" ^ /(n, r). (36) 

Similarly, the probabihty of finding n spatial candidate links (horizontal dashed 
lines in Fig. [1]) between parallel spins at site i and j in unit imaginary time 
is f{n,2Jij). After all, the overall probability of finding n events in total at 
some site or bond is given by f{n,A) with 

A = A^r + 2Jtot. (37) 

Since these events are statistically independent with each other, a series of 
events is generated successively by using the exponential distribution for the 
temporal interval t between two events: 

p{t)dt = Ae-^^dt. (38) 

At each imaginary time, then a site or bond is chosen according to the prob- 
abilities r/A or 2Jij/A, respectively. This is again done in a constant time by 
using the Walker method. If a site is chosen, the temporal bond is deactivated, 
i.e., clusters are disconnected at this space-time position. If a bond is selected 
(and if the spins on its ends are parallel), on the other hand, a spatial link is 
inserted, i.e., two sites are connected at this imaginary time (horizontal solid 
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lines in Fig. [T]). At the space-time points where the spin changes its direction 
(open circles in Fig. [1]), we always deactivate the temporal bond. By repeat- 
ing this procedure until the imaginary time j3 is reached, the whole lattice is 
divided into several clusters [Fig. [11(b)]. Finally each cluster is flipped with 
probability 1/2 to generate a terminal conflguration. 

The number of operations per Monte Carlo sweep is proportional to the num- 
ber of generated events. Its average is given by /3A, which is proportional to the 
system size A^ as the 0{N) algorithm for classical models. We note that the 
0{N) quantum cluster algorithm presented in this section is also formulated 
in the same way in the high-temperature series representation [TO] . 



4 Performance Test 

In order to demonstrate the efficiency of the present method, we carried out 
Monte Carlo simulations for the classical mean-field (or infinite-range) model 
of various system sizes (A^ = 2, 4, ■ ■ • , 2^^). We use the naive Swendsen-Wang 
and Luijten-Blote methods as benchmarks. The coupling constants of the 
mean-field model is given by 

J'^j = ^ (39) 

for all i j- The denominator A^ is introduced to prevent the energy density of 
the system from diverging in the thermodynamic limit. We choose the mean- 
field model as a severest test case for these algorithms, though simpler and 
faster algorithms, even exact analytic results, exist for this specific model. The 
benchmark test was performed on a PC workstation (CentOS Linux 5.1, Intel 
Xeon 3.2GHz, 1MB cache, GNU C++ 4.1.1). 

We confirm that all these algorithms produce the same results, total energy, 
specific heat, magnetization density squared. Binder cumulant, etc, within the 
error bar in the whole temperature range we simulated. The CPU time spent 
for one Monte Carlo sweep at the critical temperature (T = 1) is shown in 
Fig. [2J For the naive Swendsen-Wang algorithm, as one expects, the CPU 
time grows rapidly as A^^. On the other hand, it is clearly seen in Fig. [2] that 
the present algorithm has a different scaling, linear to the system size, and 
is indeed much faster than the Swendsen-Wang method except for very small 
system sizes (A^ < 4). The present and Luijten-Blote methods exhibit a similar 
scaling behavior, but the former is faster for all the system sizes we simulated. 
To see the difference in scaling behavior in detail, we plot the relative speed 
of the present algorithm to the latter in the inset of Fig. [2l For A^ < 10^, it 
scales as logA^, which is consistent with the performance difference between 
the Walker and the binary search algorithms. Around the system size A^ ^ 10^, 
however, the results for the present algorithm start to deviate from the A^- 
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1 10 10^ lo"* 10* lo"^ lO*" lo' 
N 



Fig. 2. System size dependence of CPU time per Monte Carlo sweep of the Swend- 
sen-Wang (squares), Luijten-Blote (circles), and the present (diamonds) methods 
for the mean- field model at T = 1. The solid and dashed lines indicate the theoret- 
ical asymptotic scaling for the Swendsen-Wang (A^^) and the present (A^) methods, 
respectively. The Luijten-Blote and the present methods both show an anomaly at 
N ~ 10^, which is attributed to the occurrence of cache miss. In the inset, the 
relative speed of the present algorithm to that of the Luijten-Blote method is also 
shown, where the dashed line indicates a log scaling. 

linear scaling. Those for the Luijten-Blote method shows a similar anomalous 
behavior, but the situation is much worse in this case as seen in the inset 
of Fig. [21 We attribute these anomalies to the occurrence of cache miss, for 
the spin configuration of > 10^ does not fit the cache memory, whose size 
is typically a few MB. The naive Swendsen-Wang method should also suffer 
from the same problem, but in the present benchmark test its effect seems to 
be hidden under the quadratic growth in the number of operations. 

In summary, among the existing three algorithms the present 0{N) method 
is the fastest except for very small system sizes. Especially, it outperforms 
the Swendsen-Wang method by four orders of magnitude at A^ = 2^^ and the 
Luijten-Blote method by about factor twenty at A^ = 2^^. This efficiency of the 
present method enables us to simulate much larger systems or further improve 
statistics as compared with the previous Monte Carlo studies, as demonstrated 
in the next section. 



5 Kosterlitz-Thouless Transition in Ising Chain with Inverse-square 
Interactions 

In this section, as a nontrivial example, we apply our 0{N) cluster algorithm 
to the phase transition of the one- dimensional Ising model with inverse-square 
interactions [Eq. ([21) with a = 1]. As we mentioned in the introduction, among 
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Fig. 3. Temperature dependence of magnetization density squared for (a) classical 
(r = 0) and (b) quantum (T = 1) Ising chains with inverse-square interactions. 
System sizes are L = 2^, 2^, • • • , 2^*^ from the top to the bottom. The error bar 
of each data point is much smaller than the symbol size. The dashed lines denote 
the universal jump relation Eq. ()40p . The filled diamond indicates the critical tem- 
perature obtained from the finite-size scaling analysis (see Fig. [5] below) and the 
magnetization density squared just below the critical point. 

the models with algebraically decaying interactions, this model is special as 
a boundary case, i.e., it has the weakest (or shortest) interactions to trig- 
ger a finite-temperature phase transition. What is more, this phase transi- 
tion belongs to the same universality class as the Kosterlitz-Thouless tran- 
sition [3E1I5] . where logarithmic excitations brought by formation of domain 
walls compete with the entropy generation. The Kosterlitz-Thouless transition 
is characterized by an exponential divergence of the correlation length toward 
the critical temperature Tkt and a finite jump in the magnetization. Espe- 
cially, the amount of the magnetization gap at the critical point is conjectured 
to satisfy the following universal relation: 

2m2 = Tkt, (40) 

where = {{JliOiY) /N"^ ■, being the square of magnetization density. For 
the classical Ising chain with inverse-square interactions, it is confirmed that 
a phase transition of Kosterlitz-Thouless universality occurs by an extensive 
Monte Carlo study |20j. 

We perform Monte Carlo simulations by using the 0{N) cluster algorithm for 
the chain length L = 2^, 2"^, ■ ■ ■ , 2'^^{= 1048576). We impose periodic boundary 
conditions. In order to minimize the effect of boundary conditions, we use the 
following renormalized coupling constant 

~ _ ~ 1 _ 

■^'^ - 2^ ii-i- nLY ~ ~n o7r(z-j)' ^^^^ 
n=-oo y J > sin^ ^ — 
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Fig. 4. Scaling plot of magnetization density squared for (a) classical (F = 0) and 
(b) quantum (F = 1) Ising chains with inverse-square interactions. System sizes are 
L = 2^ (crosses), 2^*^ (x-marks), • • • , 2'^^ (filled diamonds). The error bar of each 
data point is much smaller than the symbol size. 

in which contribution from all periodic images is taken into account. It reduces 
to the bare coupling constant in the thermodynamic limit L — oo. Mea- 
surement of physical quantities is performed for 524288 Monte Carlo sweeps 
after discarding 8192 sweeps for thermalization. 

In Fig. [3|, we show the temperature dependence of the magnetization density 
squared for (a) F = and (b) F = 1. For the classical system (F = 0), 
our results coincide quite well with the previous Monte Carlo study [SO]- In 
both cases, decreases monotonically as the temperature increases. At high 
temperatures, vanishes quite rapidly as the system size increases, while it 
seems converging to a finite value in the low temperature regime though the 
convergence is rather slow. This suggests an emergence of long-range order 
at some finite critical temperature. At low temperatures, of the quantum 
system is smaller than the classical one. Indeed, < 1 even at T = for 
F = 1, which is in contrast to the classical case, = 1. This is due to quantum 
fluctuations introduced by the transverse external field. In a previous quantum 
Monte Carlo study [10], intersections of magnetization curves for different 
system sizes at intermediate temperatures have been reported. In the present 
study, however, we do not observe such a nonmonotonic behavior regardless of 
the system size. We would attribute this discrepancy to a relaxation problem 
in the Monte Carlo calculation in Ref. [lOj, where only a local flip scheme is 
used for updating spin configurations. 

In order to discuss the critical behavior in detail, next we perform a finite-size 
scaling analysis. As is well known, the standard finite-size scaling technique 
does not work in the case of the Kosterlitz-Thouless transition, for the cor- 
relation length exhibits an exponential divergence. Instead of the ordinary 
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finite-size scaling, which depends on an algebraic divergence of the correlation 
length, an alternative scaling form for the magnetization has been suggested 
from the renormalization group equations [5|21f22j : 



2m2 



(42) 



1 



T 



where F{x) is a scaling function, t = T/Tkt — 1, ^ = ^og{L/Lo), and Lq a 
constant. It is confirmed that this finite-size scaling assumption works well for 
the two-dimensional XY model [22j , in which the helicity modulus, instead of 
the magnetization, is the quantity exhibiting a universal jump. 

In Fig. m we show the scaling plots for F = and 1. Both data are scaled 
excellently by using the same scaling form (1421) . where we have only two fitting 
parameters, Tkt and Lq. This strongly supports that the magnetization shows 
the universal jump fl40p at the critical point. From these scaling plots we 
conclude 



for the Kosterlitz-Thouless critical temperature. The result for the classi- 
cal case is compared with that in the previous Monte Carlo study, Tkt = 
1.5263(4) [20], which differs slightly beyond the error bar. This tiny discrep- 
ancy might be due to the difference in the way of scaling analysis. In Ref. [20] 
the magnetization data at low temperatures are first extrapolated to the ther- 
modynamic limit, then further extrapolated towards the critical point, whereas 
the Monte Carlo data are directly used to estimate the critical temperature 
in the present finite-size scaling analysis. Thus, we expect that the present 
estimate for Tkt is more reliable. 

As for the quantum system (F = 1), the critical temperature is lower than 
the classical one due to the quantum fluctuations. However, the finite-scaling 
analysis confirms that the phase transition belongs to the Kosterlitz-Thouless 
universality class as in the classical case. The finite-size scaling plots shown 
in Fig. m suggest that the scaling function itself is universal as well. As F is 
increased further, Tkt decreases monotonically, and it finally vanishes at a crit- 
ical transverse field 2.52 |23j), where a quantum phase transition occurs. 
At this point some exotic quantum critical behavior with a nontrivial dynam- 
ical exponent z is expected, since the (1 + l)-dimensional system is extremely 
anisotropic, i.e., in real space direction the system has long-range interactions, 
whereas the interaction in the imaginary-time direction is still short ranged. 
More detailed analyses on the quantum criticality of the transverse-field Ising 
model will be presented elsewhere [25] . 




(43) 
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6 Summary and Discussion 



We presented an 0{N) cluster algorithm for Ising models with long-range in- 
teractions. The algorithm proposed in the present paper is exact, i.e., it does 
not introduce any cutoff for interaction range and thus it strictly fulfills the 
detailed balance. Our algorithm is formulated based on the extended Fortuin- 
Kasteleyn representation, where bond variables have a nonnegative integral 
value instead of a binary number. For each bond, an integer is generated ac- 
cording to the Poisson distribution. However, it does not necessarily mean 
that each Poisson variable has to be generated one by one. We show that gen- 
erating an overall Poisson distribution and ex-post assignment of events, using 
Walker's method of alias, are statistically equivalent to the naive Swendsen- 
Wang method. In Sec. 4, we demonstrated the iV-linear scaling behavior in 
the CPU time for the mean-field model. 

The present method has several advantages over the existing methods, such as 
the Metropolis method, the Swendsen-Wang algorithm [7j, the improvement 
by Luijten and Blote [9], or the recently proposed 0{N) method [21], in sev- 
eral aspects: (a) The CPU time per Monte Carlo sweep is 0{N). (b) It works 
effectively both for short-range and long-range interacting models, (c) It is 
a cluster algorithm and free from the critical slowing down near the critical 
point, (d) It is possible to formulate a single-cluster variant [8]. (e) It is very 
easy to implement the algorithm, based on an existing Swendsen-Wang code, 
(f) It can also be used for systems without translational invariance, though it 
once costs 0(A^^) to initialize lookup tables, (g) Calculation of the total energy 
and the specific heat can be done all together in 0{N) time, (h) It can be ap- 
plied to Potts, XY, and Heisenberg models with the help of Wolff's embedding 
technique [8]. (i) Extension to quantum models, such as the transverse-field 
Ising model or the Heisenberg model, is also possible straightforwardly. 

In Sec. 5, we have applied our new algorithm to the phase transition of Ising 
model with inverse square interactions, where we see that the 0{N) method 
works ideally for both of the classical and quantum systems. It is confirmed 
in a high accuracy that the phase transition belongs to the same universality 
as the Kosterlitz-Thouless transition. 

Finally, let us discuss the efficiency of the present algorithm at very low tem- 
peratures. For a fixed system size N, the calculation cost of the present method 
grows linearly as the inverse temperature f3 increases, whereas that of the naive 
method is constant regardless of the temperature for classical Ising models. It 
indicates that at lower temperatures than some threshold l//?thresh, the naive 
method outperforms the present method. At extremely low temperatures, al- 
most all the bonds are activated. The present method then activates such 
bonds many times, which is the cause of the slowing down. Although the /?- 
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linear increase of CPU time is inevitable for quantum systems, where the stan- 
dard quantum Monte Carlo algorithms for short-range models also suffer from 
the same slowing down, however, one can adopt a "hybrid" scheme to optimize 
the calculation cost at intermediate temperatures, max(j£) < /? < Ahresh for 
classical models. Suppose J^'s are sorted in descending order, and we use the 
naive method for the first n bonds and the 0{N) method for the others. The 
CPU time C{n) per Monte Carlo sweep is estimated as 



where A and B are some constants. The optimal value of n is then given by 
AC = C{n + 1) — C{n) = A — 2B(3Jn = 0. For the one-dimensional model 
with algebraically decaying interactions ([2]), for example, we have 



The threshold Ahresh is defined as the inverse temperature where riopt = — 
N"^ , that is, /^thresh ~ {A/2B)N^^°- , which grows as the system size N increases. 
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A Walker's Method of Alias 

Consider a random variable X which takes an integral value i according to a 
probability Pi {1 < i < N and = !)• In this appendix, we discuss how to 
generate such random numbers effectively. One of the simplest and the most 
well-known methods is the one based on rejection: 

Rejection Method 

(1) Generate a uniform integral random variable M (1 < M < N). 



C{n) ^An + B ^ 2/3J^, 



(44) 



l=n+l 




(45) 
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(2) Generate a uniform real random variable U {0 < U < 1). 

(3) If U is smaller than PM/Pmax then X = M, otherwise repeat from (1). 

Here, pmax = niax(pj). Since the acceptance rate in step (3) is l/(iVpniax) 
(= q), the probability of obtaining X = i eventually is Pi{l —(lY~^q = Pi- 
Notice that the number of iterations is I^J^li r(l — q)''^^q = 1/g = Nptasx. on 
average, and therefore it would take 0{N) time for each generation. Especially, 
the efficiency decreases quite rapidly as the variance of pi increases. One may 
reduce the number of operations down to O(logA^) by employing the binary 
search on the table of cumulative probabilities (see Sec. 2.2). However, there 
exists a further effective method, called "Walker's method of alias" [T3|IT^ . 
which is rejection free and generates a random integer in a constant time. 

The Walker algorithm requires two tables of size N, which need to be calcu- 
lated in advance. One is the table of integral alias numbers {Ai} {1 < Ai < N) 
and the other is that of modified probabilities {Pi} (0 < Pj < 1). Using these 
tables a random integer is generated by the following procedure: 

Walker's Method of Alias 

(1) Generate a uniform integral random variable M (1 < M < A^). 

(2) Generate a uniform real random variable U {0 < U < 1). 

(3) If U is smaller than Pm then X = M, otherwise X = Am- 

This procedure has no iterations, and thus completes in a constant time. The 
meaning of the tables {Ai} and {Pi} and the correctness of the algorithm is 
readily understood with the following example: 



i 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


11 


12 


Pi 





1 

36 


2 

36 


3 

36 


4 

36 


5 

36 


6 

36 


5 

36 


4 
36 


3 

36 


2 

36 


1 

36 


P^ 





1 
3 


2 
3 


3 
3 


3 
3 


2 
3 


2 
3 


1 

3 


2 
3 





2 
3 


1 

3 


Ai 


10 


9 


8 


* 


* 


5 


6 


6 


7 


7 


8 


8 



The modified probabilities Pi are determined frompj (see Appendix B), which 
gives the probabilities whether one should accept the firstly chosen number or 
choose the alias number Ai. Let us consider, for example, the probability of 
X = 9. There are two possibilities: One is M = 9 and U < Pg, and the other 
is M = 2 and U > P2 since A2 = 9. The sum of these two probabilities is 

^[P9 + {l-P2)] = l, (A.l) 

which is equal to pg as expected. One can confirm that {Pi} and {^4,} are 
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given in the example so that 



1 ^ ^ 



p.+5:(i-p,)<5,A, 



(A.2) 



holds for i = 1, 2, ■ • • , A^. Together with the ordinary requirement for proba- 
bilities, < Pj < 1 {i = 1,2, ■ ■ ■ , N), Eq. (lA.2p is the necessary condition for 
{Pi} and {Ai} to satisfy. 

In practice, when N is not a power of two, we expand the size of tables 
from to A^opt, where Aopt is the smallest integer satisfying Aopt > A^- For 
A" + 1 < z < Aopt, we assume = 0. In this way, generating M in step (1) 
is optimized as a bit shift operation on a 32- or 64-bit integral random num- 
ber [14]. Furthermore, steps (2) and (3) can be replaced by a comparison 
between two integral variables by preparing a table of integers {2^^Pi} (or 
{2^^Pj}) instead of floating point numbers {Pi}, by which a costly conversion 
from an integer to a floating point variable can also be avoided. 

In summary, by using the Walker method, integral random numbers according 
to arbitrary probabilities can be generated in a constant time. This extreme 
efficiency is essential for the present 0{N) cluster Monte Carlo method. In 
the next appendix, we describe how to prepare the tables {Pi} and {Ai}. 



B Preparation of Modified Probabilities and Aliases 



In the original paper by Walker [13] and also in the standard literature |14j . 
only a naive 0{N'^) method is presented for initializing {P,} and {^4,}. Here 
we propose for the first time an efficient alternative procedure, which takes 
only 0{N) time. 

Consider the following table of probabilities for {pi}: 
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Here Pj is initially set to a tentative value Npi for i = I,- - ■ ,N. First we 
rearrange the table so that all the elements with Pj > 1 precede those with 
Pi < 1 
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V 



4 5 6 7 8 9 10 



12 11 3 2 



▼ 

1 




The rearrangement can be done by N steps in contrast to the perfect sorting, 
which is an 0{N log N) procedure. The white triangle points to the rightmost 
element with Pi > 1 and the black triangle points to the rightmost element in 
the rearranged table. 



Next, we determine the alias numbers sequentially from the right. We fill 
the "shortfall" {1 — Pi) of the element pointed by the sohd triangle by the 
one pointed by the white triangle. The latter is always large enough, since 
-Pi ^ 1 by definition. In the present example, first the shortfall of the rightmost 
element (1 — Pi) = 1 is filled by the element with i = 10. The alias number 
for i = 1 is then set to 10 and Pio is replaced by Pio — (1 — Pi) = 0. Since 
Pio is no more larger than nor equal to unity, we shift the white triangle to 
the left by one. We repeat the same for the next "unfilled" element. After four 
iterations, the table is transformed as follows: 



V 



4 5 6 7 



p 3 4 5 6 
* 3 3 3 3 



8 9 



▼ 

10 12 

I 



11 3 2 



2 1 

3 3 



1 





8 9 10 



Here the solid and white triangles are shifted four and three times from their 
original positions, respectively. The above procedure is repeated until the black 
triangle points to the same element as the white one, i.e., all the elements get 
filled. One should note that the black triangle always moves by one after each 
iteration, though the white one may stay on the same element depending 
whether P^ > 1 or not after the step. The whole procedure is thus completed 
at most after (N — 1) iterations. In the present example, after 10 iterations 
we end up with 



P^ 
A, 



T 

V 

4 5 



6 7 8 9 



5 6 6 7 



10 12 11 

3 3 

7 8 8 



3 2 



1 

9 10 



which is equivalent to the table presented in Appendix A. 
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